Let's say we want to prepare data and try several scalers and classifiers for a classification problem. We will tune the classifiers' hyperparameters with a grid search.
Preparing the data:
In [1]:
from sklearn.datasets import make_classification
X, y = make_classification()
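With make_classification's default arguments this is a small synthetic binary problem; a quick shape check (purely an illustrative sanity check, not part of the original workflow) should show 100 samples and 20 features:
# Expected with the defaults: (100, 20) (100,)
print(X.shape, y.shape)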
Setting the steps for our pipelines and the parameters for the grid search. Note that the keys of param_grid match the names given to the classifiers, so each classifier gets its own grid:
In [2]:
from reskit.core import Pipeliner
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
classifiers = [('LR', LogisticRegression()),
               ('SVC', SVC())]

scalers = [('standard', StandardScaler()),
           ('minmax', MinMaxScaler())]

steps = [('scaler', scalers),
         ('classifier', classifiers)]

param_grid = {'LR': {'penalty': ['l1', 'l2']},
              'SVC': {'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}}
Setting up the cross-validation used for the grid search over hyperparameters and the cross-validation used to evaluate the models with the obtained hyperparameters. Note the different random states, so the two splitters produce different folds:
In [3]:
from sklearn.model_selection import StratifiedKFold
grid_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
eval_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
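For orientation, here is a rough sketch of what Pipeliner automates for a single scaler/classifier combination, written in plain scikit-learn (the names manual_pipe, search and scores are ours for illustration, not part of reskit):
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# One row of the plan: the 'standard' scaler combined with the 'SVC' classifier.
manual_pipe = Pipeline([('scaler', StandardScaler()),
                        ('classifier', SVC())])

# Hyperparameters are tuned with grid_cv...
search = GridSearchCV(manual_pipe,
                      param_grid={'classifier__kernel': ['linear', 'poly',
                                                         'rbf', 'sigmoid']},
                      cv=grid_cv, scoring='roc_auc')
search.fit(X, y)

# ...and the model with the found hyperparameters is evaluated with eval_cv.
scores = cross_val_score(search.best_estimator_, X, y,
                         cv=eval_cv, scoring='roc_auc')
print(scores.mean())
Pipeliner performs the analogous procedure for every row of the plan table.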
Creating the plan of our experiment. Pipeliner builds a table with one row for every scaler/classifier combination:
In [4]:
pipe = Pipeliner(steps=steps, grid_cv=grid_cv, eval_cv=eval_cv, param_grid=param_grid)
pipe.plan_table
Out[4]:
To tune the hyperparameters and evaluate each pipeline from the plan, run:
In [5]:
pipe.get_results(X, y, scoring=['roc_auc'])
Out[5]:
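The returned table lists, for each scaler/classifier combination from the plan, the hyperparameters chosen on grid_cv and the roc_auc scores obtained on eval_cv.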